Principal component analysis

A technique for finding the orthogonal axes (the principal components) along which the data vary the most.

See also Singular value decomposition.
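
To make the SVD connection concrete, here is a minimal sketch of PCA computed from the SVD of the centered data matrix (a Python/numpy illustration; the toy data and all variable names are mine, not from any source cited in this note):

```python
import numpy as np

# Toy data: rows are observations, columns are features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Center each feature, then take the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

axes = Vt                                 # principal axes, one per row
var_explained = s**2 / (X.shape[0] - 1)   # variance along each axis
scores = Xc @ Vt.T                        # data expressed in the new axes
```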

People

Books

How to

Preprocessing

Excerpt:

  1. Sqrt any count features; log any heavy-tailed features. PCA prefers things that are “homoscedastic” (which is my favorite word to ASMR, and I literally do it in class); sqrt and log are “variance stabilizing transformations” (see the first sketch after this list).
  2. Localization is noise; regularize when you normalize.
    1. If you make a histogram of a component (or loading) vector and it has really big outliers, that is localization. It’s bad: it means the vector is noise.
    2. diagnostic: https://github.com/karlrohe/LocalizationDiagnostic
    3. To address localization, I would suggest normalizing by regularized row/column sums. This works like fucking magic. Not even kidding. Let rs and cs be the row and column sums of A; set D_r = Diagonal(1 / sqrt(rs + mean(rs))) and D_c = Diagonal(1 / sqrt(cs + mean(cs))), then do the SVD on D_r A D_c (see the second sketch after this list).
      1. paper: Zhang2018understanding
      2. youtube: https://www.youtube.com/watch?v=lOCoa3hYR4Y
  3. and my favorite rule, the Cheshire cat rule: “One day Alice came to a fork in the road and saw a Cheshire cat in a tree. ‘Which road do I take?’ she asked. ‘Where do you want to go?’ was his response. ‘I don’t know,’ Alice answered. ‘Then,’ said the cat, ‘it doesn’t matter.’”
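
A minimal sketch of rule 1’s variance-stabilizing transforms (Python/numpy; the Poisson and lognormal toy features are my own illustration, not from the excerpt):

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(3.0, size=(500, 10)).astype(float)     # count features
heavy = rng.lognormal(mean=0.0, sigma=1.5, size=(500, 10))  # heavy-tailed features

# sqrt stabilizes Poisson-like variance; log1p (log of 1 + x) tames heavy
# tails while tolerating zeros.
counts_vst = np.sqrt(counts)
heavy_vst = np.log1p(heavy)
```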
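
And a sketch of rule 2.3’s regularized normalization followed by the SVD (Python/numpy; `A` is a hypothetical nonnegative matrix, and the histogram at the end is a crude stand-in for the localization diagnostic linked above, not Rohe’s code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.poisson(0.5, size=(300, 200)).astype(float)  # nonnegative data matrix

rs = A.sum(axis=1)   # row sums
cs = A.sum(axis=0)   # column sums

# Regularize each sum by adding its mean before taking 1/sqrt.
D_r = np.diag(1.0 / np.sqrt(rs + rs.mean()))
D_c = np.diag(1.0 / np.sqrt(cs + cs.mean()))

# SVD of the regularized, normalized matrix D_r A D_c.
U, s, Vt = np.linalg.svd(D_r @ A @ D_c, full_matrices=False)

# Rule 2.1 in histogram form: a localized singular vector shows a few
# extreme outliers; a healthy one does not.
print(np.histogram(U[:, 0], bins=10)[0])
```

Adding the mean keeps rows and columns with very small sums from blowing up the scaling, which is the point of regularizing when you normalize.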

Video

Articles

Studies